Automatic Paleographic Exploration of Genizah Manuscripts

Authors

  • Lior Wolf
  • Nachum Dershowitz
  • Liza Potikha
  • Tanya German
  • Roni Shweka
  • Yaacov Choueka
Abstract

The Cairo Genizah is a collection containing approximately 250,000 hand-written fragments of mainly Jewish texts discovered in the late 19th century. The fragments are today spread out in some 75 libraries and private collections worldwide, and there is an ongoing effort to document and catalogue all extant fragments. Paleographic information plays a key role in the study of the Genizah collection. Script style, and – more specifically – handwriting, can be used to identify fragments that might originate from the same original work. Such matched fragments, commonly referred to as “joins”, are currently identified manually by experts, and presumably only a small fraction of existing joins have been discovered to date. In this work, we show that automatic handwriting matching functions, obtained from non-specific features using a corpus of writing samples, can perform this task quite reliably. In addition, we explore the problem of grouping various Genizah documents by script style, without being provided any prior information about the relevant styles. The results show that the automatically obtained grouping agrees, for the most part, with the paleographic taxonomy. In cases where the system fails, it is due to apparent similarities between related scripts.

Similar resources

Computerized Paleography Exploration of Historical Manuscripts

The modern scholar of history or of other disciplines is often faced today with hundreds of thousands of readily-available and potentially-relevant full or fragmentary documents, but without computer aids it is a very hard and usually even impossible task to find the sought-after needles in the proverbial haystack of online images. The Cairo Genizah is a collection containing approximately 250,...

Automatic extraction of catalog data from digital images of historical manuscripts

The Cairo Genizah, discovered in the late 19th century, is a collection of handwritten historical documents containing approximately 350,000 fragments of mainly Jewish texts. The fragments are today spread out in more than seventy libraries and private collections worldwide, and there is an ongoing effort to document and catalog all extant fragments. We explore three levels of extraction of cat...

Enriching Digitized Medieval Manuscripts: Linking Image, Text and Lexical Knowledge

This paper describes an on-going project of transcribing and annotating digitized manuscripts of medieval Spanish with paleographic and lexical information. We link lexical units from the manuscripts with the Multilingual Central Repository (MCR), making terms retrievable by any of the languages that integrate MCR. The goal of the project is twofold: creating a paleographic knowledge base from ...

SPI: A System for Paleographic Inspections

The main interest in paleographers' work is to relate the culture and the writing styles of ancient manuscripts by analysing the morphology of scripts. Unfortunately, experts often disagree on the analysis methods. For this reason, a user-independent system based on statistical methods can be very helpful for experts in determining which morphological features are relevant for the description of t...

Citation and Alignment: Scholarship Outside and Inside the Codex

We describe a hierarchical approach to modeling text that allows machine-actionable canonical citation of text at many levels of specificity. This model addresses the problem of overlapping or mutually exclusive analyses. In turn, this flexibility in citation allows rich linking of textual transcriptions and other data to regions-of-interest on digital images, of particular value to codicological...

Publication date: 2010